-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add celery broker configuration example using Amazon SQS #24019
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
We had reports that |
I also checked |
The PR is likely ready to be merged. No tests are needed as no important environment files, nor python files were modified by it. However, committers might decide that full test matrix is needed and add the 'full tests needed' label. Then you should rebase it to the latest main or amend the last commit of the PR, and push it with --force-with-lease. |
# Celery's environment variables. | ||
export AWS_ACCESS_KEY_ID=your_aws_access_key_id | ||
export AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key | ||
export AWS_DEFAULT_REGION=your_aws_region |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if will be great if you can also mention this setting #21507
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add environment variable AIRFLOW__CELERY__WORKER_ENABLE_REMOTE_CONTROL=true
but new queues are not created.
@@ -26,6 +26,17 @@ to work, you need to setup a Celery backend (**RabbitMQ**, **Redis**, ...) and | |||
change your ``airflow.cfg`` to point the executor parameter to | |||
``CeleryExecutor`` and provide the related Celery settings. | |||
|
|||
If you use Amazon SQS as a Celery broker, set environment variables like this: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a note that this is experimental support/use at own risk. It would be great also to add something along "there are anecdotal evidences that Amazon SQS" might be used as the broker backend.
We do not run any tests for SQS as celery backend (as opposed to rabbitmq and redis - we have integration tests for both).
We do not want to give the users false impression that they can expect it to work and ask as question "I followed the documentation and it does not work" - unless we have CI/tests covering the scenario we are talking about.
Or maybe @ferruzzi @o-nikolas @john-jac - there could be some blog post or documentation from Amazon team to describe it. I believe MWAA uses SQS as broker and maybe one of you (or someone from the amazon developer relations team) coudl write a blog post on "How to use SQS as celery broker for Airflow? " that woudl be nice - and then any questions about it could come to Amazon- I am afraid until we make system tests/integration with Amazon, making a statement that "SQS works and here is the configuration" in Airlfow docs is a little too much "leap of faith" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also - this is also important because it is not enough to just have those configs setting. See here #22863 (comment) - there area also some other things you need to have in airflow to make use of SQS - like installing celery with sqs extra and some additional prerequisites.
If we "claim" that Airlfow support SQS but we have no tesing and it does not work by default using "official" image of Airlfow, it opens a "floodgate" for people who would like to get support in their case because "we claimed it works" without actually providing them instructions how to do it.
I am more convinced looking at the discussion in #22863 (comment) that if we are to mention SQS at all, it needs to be backed by someone who will take the ownership and supoort of people in case it does not work for them.
Hmm. I thought a bit about it. And following the dicsusion in #25121 - I think we should not add "offficial documenation" to features that our users tested, but we have no automated tests for and do not know ourselves what might be potential risks, problems etc. I think this falls in the same camp. @BasPH @jedcunningham @feluelle - WDYT? I think @tsugumi-sys that a better approach would be that you (for example) write a short blog post about using SQS as a broker, and describe your experiences it - and then we can even add it to "Airlfow" publication on Medium and even link it from the Ecosystem page of Airflow. https://airflow.apache.org/ecosystem/ I think adding it to the official docs in experimental phase when we do not have any tests (nor intention to add them) is IMHO something we should not do - even if we add "Experimental" - because "experimental" means that we are planning to "run" some experiment and turn it into a "full feature" (or drop) when we finish the experiment. This is what happens with:
In both cases we already have tests in CI and we actively plan to act and gather observations/. For example I think we might be inclined to drop MsSQL if the Survey results will not show significant uptake in MsSQL usage after it's been officially "experimentally" supported in 2.3 https://airflow.apache.org/blog/airflow-survey-2022/ shows a very little usage so far. Adding information that SQS "works" in the official docs is some kind of statement we make to our users and they might have certain expectations if they follow it. I think I'd be only inclined to add "experimental" SQS support if we have some tests in our CI making sure that it works (BTW. We have such - simple but regularly running tests for RabbitMQ and Redis). Others - WDYT? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to clarify if we REALLY want to have such statement in our docs.
I think it's perfectly reasonable to not want it in the docs if we can't back up the claim with testing. |
I feel like the situation here is slightly different as it is officially supported by Celery, and presumably properly tested there. I'm not sure we want to exhaustively test our dependencies. That said, this is a pretty critical piece of the puzzle 🤷 |
Actually that might change my view. I did not realize SQS is on-par with RabbitMQ/Redis for Celery. On the other Hand MAYBE this is a GREAT opportunity to add a test or two using Moto ? https://qxf2.com/blog/mocking-out-aws-sqs-using-moto/ Then we would be completely covered and it shoudl be no problem to make it official. @ferruzzi @o-nikolas @vincbeck ? WDYT> |
FYI. @jedcunningham - those are the kind of problems we might expect #25486 . I feel rather uncomfortable not having any tests around it. I am not sure even how we would approach diagnosing and fixing similar issues if we do not have any tests like that. There might be many reasons for this kind of failures (celery/sqs incompatibilities, our way of integrating with celery etc. |
Predefined queues, or any other broker transport options can be configured by a plugin to get around the string vs object discrepancy. |
I have a love/hate relationship with using SQS as a celery broker :) (maybe this is for a blog post...) As a tl;dr: it works, but has a lot of gotchas, and I wouldn't necessarily recommend it generally to airflow users. SQS is serverless, scales great, and is relatively fast. But it doesn't actually have the same level of support as RabbitMQ and Redis have as a Celery broker. Namely the Monitor and Remote Control APIs are not supported, so while it is stable, it has far less support. Also SQS has the visibility timeout architecturally built into it, which means you can't have tasks that run longer than that timeout without that causing issues (max value for that timeout is 12 hours). I still support adding automated testing for SQS though, I think that's a worthwhile task. |
That comment from @o-nikolas make me think we are NOT ready to make it as an official part of our documentation. But as a blog post was mentioned several times - if either of you would like to write it, I think we can publish it in https://medium.com/apache-airflow |
I think we can close it now. We can always reopen if we think it is worthwile, but I'd bee rather sceptical on supporting SQS knowing the caveats. |
FYI: I think that was a good idea to not "pretend" we support SQS when we have no tests for it. An example of interesting problem with it is here for example: #26521 |
Added because there is no clear configuration example using Amazon SQS as Celery broker #22863.
Note: Dictionary cannot be used in airflow.cfg because configparser cannot parse it ?. So you cannot set
celery_broker_transport_options
like this.Close #22863
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragement file, named
{pr_number}.significant.rst
, in newsfragments.